Supplementary Material: Model Class Reliance for Random Forests

Neural Information Processing Systems

Replication is facilitated through four hosted Python notebooks that reproduce the paper's results. When tested, the hosted runtimes were running Python 3.6.9. The packages developed as part of this work are discussed below and are made available via the notebooks above. The code is written as an extension to the sklearn RandomForestRegressor and RandomForestClassifier classes, and the wrapper calls the R code from the lead author's GitHub. If the notebooks are run on a hosted instance, these dependencies are installed automatically.
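As a rough illustration of what a model-reliance computation on a sklearn random forest looks like (a permutation-style proxy sketched here for intuition; the paper's actual extension classes and R wrapper may compute reliance differently), one can compare the loss after permuting a feature to the original loss:

```python
# Hypothetical sketch: reliance of a fitted random forest on one feature,
# estimated by permuting that feature's column. This is an illustrative
# proxy, not the package distributed with the paper.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def model_reliance(model, X, y, feature, n_repeats=10, seed=0):
    """Ratio of permuted-feature loss to original loss for one feature."""
    rng = np.random.default_rng(seed)
    base = mean_squared_error(y, model.predict(X))
    losses = []
    for _ in range(n_repeats):
        Xp = X.copy()
        # Break the feature-outcome association by shuffling the column.
        Xp[:, feature] = rng.permutation(Xp[:, feature])
        losses.append(mean_squared_error(y, model.predict(Xp)))
    return np.mean(losses) / base

mr = model_reliance(rf, X, y, feature=0)
print(mr)  # ratio > 1 indicates the model relies on the feature
```

A ratio near 1 means the model barely uses the feature; larger ratios indicate stronger reliance.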


The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance (Supplementary Material)

Neural Information Processing Systems

The following lemma states that each level set of the quadratic loss surface is a hyper-ellipsoid, providing another useful tool for the propositions given in this section. Lemma 2. The level set of the quadratic loss at ε is a hyper-ellipsoid centered at the loss minimizer.
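For intuition, under a generic quadratic parameterization (the symbols A, θ*, and c below are illustrative placeholders, not the paper's notation), the lemma takes this form:

```latex
% Illustrative form only. For a quadratic loss
%   \ell(\theta) = (\theta - \theta^*)^\top A\, (\theta - \theta^*) + c,
% with A positive definite, the level set at \varepsilon > c is
\{\theta : \ell(\theta) = \varepsilon\}
  \;=\; \{\theta : (\theta - \theta^*)^\top A\, (\theta - \theta^*)
                   = \varepsilon - c\},
% a hyper-ellipsoid centered at \theta^*, with axes along the
% eigenvectors of A and semi-axis lengths
% \sqrt{(\varepsilon - c)/\lambda_i} for the eigenvalues \lambda_i of A.
```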




The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance

Donnelly, Jon, Katta, Srikar, Rudin, Cynthia, Browne, Edward P.

arXiv.org Machine Learning

Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available at https://github.com/jdonnelly36/Rashomon_Importance_Distribution.
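The core idea of pooling importance over bootstraps and over all near-optimal models can be sketched as follows (a hypothetical toy version, not the authors' implementation, which lives at the linked repository; the model sweep and importance metric here are placeholder choices):

```python
# Hypothetical sketch: estimate a *distribution* of variable importance by
# (1) bootstrapping the data and (2) keeping every model whose accuracy is
# within epsilon of the best found (a crude "Rashomon set"), then pooling
# per-model importances. A real Rashomon set enumerates far more models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
rng = np.random.default_rng(0)

def importance_distribution(X, y, n_boot=10, epsilon=0.02):
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        Xb, yb = X[idx], y[idx]
        # Placeholder model sweep over a few tree configurations.
        models = [DecisionTreeClassifier(max_depth=d, random_state=s).fit(Xb, yb)
                  for d in (2, 3, 4) for s in range(3)]
        best = max(m.score(Xb, yb) for m in models)
        for m in models:
            if m.score(Xb, yb) >= best - epsilon:  # within epsilon of best
                r = permutation_importance(m, Xb, yb, n_repeats=5,
                                           random_state=0)
                samples.append(r.importances_mean)
    return np.array(samples)  # one row per (bootstrap, good model) pair

dist = importance_distribution(X, y)
print(dist.mean(axis=0))  # average importance per feature
```

The rows of `dist` form an empirical importance distribution per variable, from which one can read off means, quantiles, or rankings.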


Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Dong, Jiayun, Rudin, Cynthia

arXiv.org Machine Learning

Variable importance is central to scientific studies, including the social sciences, causal inference, healthcare, and other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them and not to others? In that case, we may not be able to tell from a single well-performing model whether a variable is always important in predicting the outcome. Rather than depending on variable importance for a single predictive model, we would like to explore variable importance for all approximately-equally-accurate predictive models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections to other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. Experiments with criminal justice and marketing data illustrate how variables can change dramatically in importance for approximately-equally-accurate predictive models.
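A minimal sketch of the cloud-and-diagram idea (hypothetical code for intuition, not the authors' method; the model family, tolerance, and importance metric are placeholder assumptions) is to collect per-variable importances across many near-optimal models and then project two coordinates:

```python
# Hypothetical sketch: a variable importance "cloud" is one importance vector
# per approximately-equally-accurate model; a variable importance diagram is
# its projection onto two chosen variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

# Placeholder model family: a regularization path; keep models within
# 2% accuracy of the best one found.
models = [LogisticRegression(C=c, max_iter=1000).fit(X, y)
          for c in np.logspace(-2, 2, 15)]
best = max(m.score(X, y) for m in models)
good = [m for m in models if m.score(X, y) >= best - 0.02]

cloud = np.array([permutation_importance(m, X, y, n_repeats=5,
                                         random_state=0).importances_mean
                  for m in good])   # one point per good model
diagram = cloud[:, [0, 1]]          # project onto variables 0 and 1
print(diagram.shape)
```

Scatter-plotting `diagram` shows how the importance of two variables trades off across the set of good models.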